Automated Disease Normalization with Low Rank Approximations

Authors

  • Robert Leaman
  • Zhiyong Lu
Abstract

While machine learning methods for named entity recognition (mention-level detection) have become common, machine learning methods have rarely been applied to normalization (concept-level identification). Recent research introduced a machine learning method for normalization based on pairwise learning to rank. This method, DNorm, uses a linear model to score the similarity between mentions and concept names, and has several desirable properties, including learning term variation directly from training data. In this manuscript we employ a dimensionality reduction technique based on low-rank matrix approximation, similar to latent semantic indexing. We compare the performance of the low-rank method to previous work, using disease name normalization in the NCBI Disease Corpus as the test case, and demonstrate increased performance as the matrix rank increases. We further demonstrate a significant reduction in the number of parameters to be learned and discuss the implications of this result in the context of algorithm scalability.
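The scoring scheme the abstract describes can be sketched as follows. This is a hedged illustration, not DNorm's actual implementation: the vocabulary size, rank, and vectors below are made up, and a real model would learn the factors from ranked training pairs rather than draw them at random. The point is only the parameter count: a full similarity matrix W needs d² weights, while a rank-r factorization W ≈ U Vᵀ needs 2dr.

```python
import numpy as np

d = 1000   # vocabulary size (illustrative)
r = 50     # rank of the approximation (illustrative)

rng = np.random.default_rng(0)

# Full model: W is d x d  -> d**2 parameters to learn.
# Low-rank model: W ~= U @ V.T with U, V of shape (d, r) -> 2*d*r parameters.
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))

m = rng.standard_normal(d)  # stand-in TF-IDF vector for a disease mention
c = rng.standard_normal(d)  # stand-in TF-IDF vector for a concept name

# Score without ever materializing the d x d matrix: (U^T m) . (V^T c)
score = (U.T @ m) @ (V.T @ c)

# Equivalent to forming W explicitly, at far higher memory cost:
W = U @ V.T
assert np.allclose(score, m @ W @ c)

print(f"parameters: full={d * d}, low-rank={2 * d * r}")
```

At d = 1000 and r = 50 the factorized form cuts the parameter count from 1,000,000 to 100,000, which is the scalability effect the abstract discusses.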

Related sources

Low-Rank Approximations with Sparse Factors I: Basic Algorithms and Error Analysis

We consider the problem of computing low-rank approximations of matrices. The novel aspects of our approach are that we require the low-rank approximations to be written in a factorized form with sparse factors, and that the degree of sparsity of the factors can be traded off for reduced reconstruction error via certain user-determined parameters. We give a detailed error analysis of our proposed algorith...

Subspace Iteration Randomization and Singular Value Problems

A classical problem in matrix computations is the efficient and reliable approximation of a given matrix by a matrix of lower rank. The truncated singular value decomposition (SVD) is known to provide the best such approximation for any given fixed rank. However, the SVD is also known to be very costly to compute. Among the different approaches in the literature for computing low-rank approxima...
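The optimality property this snippet refers to (the Eckart–Young theorem: the truncated SVD is the best approximation of any given fixed rank, e.g. in Frobenius norm) can be checked numerically. The matrix size and rank below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))  # illustrative matrix
k = 2                            # illustrative target rank

# Truncated SVD: keep the k largest singular triplets.
Uf, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (Uf[:, :k] * s[:k]) @ Vt[:k, :]

# The Frobenius error of the rank-k truncation equals the root of the
# discarded squared singular values -- no rank-k matrix does better.
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```

The cost concern raised in the snippet is that computing this full SVD is expensive for large matrices, which motivates randomized and subspace-iteration alternatives.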

Complex Tensors Almost Always Have Best Low-rank Approximations

Low-rank tensor approximations are plagued by a well-known problem: a tensor may fail to have a best rank-r approximation. Over ℝ, it is known that such failures can occur with positive probability, sometimes with certainty: in ℝ^{2×2×2}, every tensor of rank 3 fails to have a best rank-2 approximation. We will show that while such failures still occur over ℂ, they happen with zero probability. I...

Orthogonal Rank-two Tensor Approximation: a Modified High-order Power Method and Its Convergence Analysis

With the notable exceptions that tensors of order 2 (that is, matrices) always have best approximations of arbitrary low rank, and that tensors of any order always have a best rank-one approximation, it is known that higher-order tensors can fail to have best low-rank approximations. When the condition of orthogonality is imposed, even in the most general case that only one pair of components in...

Invited session proposal “Low-rank approximation”

Low-rank approximations play an important role in systems theory and signal processing. The problems of model reduction and system identification can be posed and solved as a low-rank approximation problem for structured matrices. On the other hand, signal source separation, nonlinear system identification, and multidimensional signal processing and data analysis can be approached with powerful...

Journal title:

Volume   Issue

Pages  -

Publication date: 2014